Module 11
NIST/SEMATECH e-Handbook of Statistical Methods
Current standard for working with dates: ISO8601
The original Date class stored dates as number of days since 1970-01-01
Stored as seconds since 1970-01-01
[1] 1704135600
List of 11
$ sec : num 0
$ min : int 0
$ hour : int 12
$ mday : int 1
$ mon : int 0
$ year : int 124
$ wday : int 1
$ yday : int 0
$ isdst : int 0
$ zone : chr "MST"
$ gmtoff: int NA
- attr(*, "tzone")= chr [1:3] "" "MST" "MDT"
- attr(*, "balanced")= logi TRUE
strptime() - Traditional parsing method
[1] "2003-01-02 MST"
[1] "2003-02-01 MST"
[1] "2003-01-02 MST"
as.Date() with format specification
[1] "C"
[1] "Mon Jan 01 2024"
Pre-ISO 8601, timezone handling was less standardized
Could be interpreted multiple ways:
The current ISO 8601 standard in R offers several key advantages:
Here, the date is still being read as a character string.
The read_csv() function from the readr package will automatically parse dates.
We can use the lubridate package to coerce the date column to a date object.
ggplot2 Recogizes DatesWe can then format them as needed.
An ordered sequence of values of a variable at equally spaced time intervals.
From fpp3:
Examples of time series data include:
Annual Google profits
Quarterly sales results for Amazon
Monthly rainfall
Weekly retail sales
Daily IBM stock prices
Hourly electricity demand
5-minute freeway traffic counts
Time-stamped stock transaction data
Anything that is observed sequentially over time is a time series.
ts() data structure (1/2) Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
1959 315.42 316.31 316.50 317.56 318.13 318.00 316.39 314.65 313.68 313.18
1960 316.27 316.81 317.42 318.87 319.87 319.43 318.01 315.74 314.00 313.68
1961 316.73 317.54 318.38 319.31 320.42 319.61 318.42 316.63 314.83 315.16
1962 317.78 318.40 319.53 320.42 320.85 320.45 319.45 317.25 316.11 315.27
1963 318.58 318.92 319.70 321.22 322.08 321.31 319.58 317.61 316.05 315.83
1964 319.41 320.07 320.74 321.40 322.06 321.73 320.27 318.54 316.54 316.71
1965 319.27 320.28 320.73 321.97 322.00 321.71 321.05 318.71 317.66 317.14
1966 320.46 321.43 322.23 323.54 323.91 323.59 322.24 320.20 318.48 317.94
1967 322.17 322.34 322.88 324.25 324.83 323.93 322.38 320.76 319.10 319.24
1968 322.40 322.99 323.73 324.86 325.40 325.20 323.98 321.95 320.18 320.09
1969 323.83 324.26 325.47 326.50 327.21 326.54 325.72 323.50 322.22 321.62
1970 324.89 325.82 326.77 327.97 327.91 327.50 326.18 324.53 322.93 322.90
1971 326.01 326.51 327.01 327.62 328.76 328.40 327.20 325.27 323.20 323.40
1972 326.60 327.47 327.58 329.56 329.90 328.92 327.88 326.16 324.68 325.04
1973 328.37 329.40 330.14 331.33 332.31 331.90 330.70 329.15 327.35 327.02
1974 329.18 330.55 331.32 332.48 332.92 332.08 331.01 329.23 327.27 327.21
1975 330.23 331.25 331.87 333.14 333.80 333.43 331.73 329.90 328.40 328.17
1976 331.58 332.39 333.33 334.41 334.71 334.17 332.89 330.77 329.14 328.78
1977 332.75 333.24 334.53 335.90 336.57 336.10 334.76 332.59 331.42 330.98
1978 334.80 335.22 336.47 337.59 337.84 337.72 336.37 334.51 332.60 332.38
1979 336.05 336.59 337.79 338.71 339.30 339.12 337.56 335.92 333.75 333.70
1980 337.84 338.19 339.91 340.60 341.29 341.00 339.39 337.43 335.72 335.84
1981 339.06 340.30 341.21 342.33 342.74 342.08 340.32 338.26 336.52 336.68
1982 340.57 341.44 342.53 343.39 343.96 343.18 341.88 339.65 337.81 337.69
1983 341.20 342.35 342.93 344.77 345.58 345.14 343.81 342.21 339.69 339.82
1984 343.52 344.33 345.11 346.88 347.25 346.62 345.22 343.11 340.90 341.18
1985 344.79 345.82 347.25 348.17 348.74 348.07 346.38 344.51 342.92 342.62
1986 346.11 346.78 347.68 349.37 350.03 349.37 347.76 345.73 344.68 343.99
1987 347.84 348.29 349.23 350.80 351.66 351.07 349.33 347.92 346.27 346.18
1988 350.25 351.54 352.05 353.41 354.04 353.62 352.22 350.27 348.55 348.72
1989 352.60 352.92 353.53 355.26 355.52 354.97 353.75 351.52 349.64 349.83
1990 353.50 354.55 355.23 356.04 357.00 356.07 354.67 352.76 350.82 351.04
1991 354.59 355.63 357.03 358.48 359.22 358.12 356.06 353.92 352.05 352.11
1992 355.88 356.63 357.72 359.07 359.58 359.17 356.94 354.92 352.94 353.23
1993 356.63 357.10 358.32 359.41 360.23 359.55 357.53 355.48 353.67 353.95
1994 358.34 358.89 359.95 361.25 361.67 360.94 359.55 357.49 355.84 356.00
1995 359.98 361.03 361.66 363.48 363.82 363.30 361.94 359.50 358.11 357.80
1996 362.09 363.29 364.06 364.76 365.45 365.01 363.70 361.54 359.51 359.65
1997 363.23 364.06 364.61 366.40 366.84 365.68 364.52 362.57 360.24 360.83
Nov Dec
1959 314.66 315.43
1960 314.84 316.03
1961 315.94 316.85
1962 316.53 317.53
1963 316.91 318.20
1964 317.53 318.55
1965 318.70 319.25
1966 319.63 320.87
1967 320.56 321.80
1968 321.16 322.74
1969 322.69 323.95
1970 323.85 324.96
1971 324.63 325.85
1972 326.34 327.39
1973 327.99 328.48
1974 328.29 329.41
1975 329.32 330.59
1976 330.14 331.52
1977 332.24 333.68
1978 333.75 334.78
1979 335.12 336.56
1980 336.93 338.04
1981 338.19 339.44
1982 339.09 340.32
1983 340.98 342.82
1984 342.80 344.04
1985 344.06 345.38
1986 345.48 346.72
1987 347.64 348.78
1988 349.91 351.18
1989 351.14 352.37
1990 352.69 354.07
1991 353.64 354.89
1992 354.09 355.33
1993 355.30 356.78
1994 357.59 359.05
1995 359.61 360.74
1996 360.80 362.38
1997 362.49 364.34
ts() data structure (2/2) Time-Series [1:92, 1:4] from 1918 to 2009: 174 174 174 174 174 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "Erie" "michHuron" "Ontario" "StClair"
Time Series:
Start = 1918
End = 2009
Frequency = 1
Erie michHuron Ontario StClair
1918 174.0150 176.8867 74.87250 174.9567
1919 174.1808 176.7450 74.95583 175.0767
1920 173.9083 176.6250 74.56750 174.8150
1921 174.0258 176.4883 74.66583 174.8692
1922 173.9275 176.4450 74.65667 174.8108
1923 173.7575 176.2642 74.44333 174.6525
1924 173.8433 176.1867 74.55750 174.6375
1925 173.5933 175.9192 74.39000 174.4058
1926 173.6408 175.8850 74.40250 174.4017
1927 173.8183 176.1483 74.60667 174.5983
1928 173.9517 176.4433 74.80083 174.8050
1929 174.3117 176.8958 75.08500 175.2642
1930 174.2700 176.6508 75.08083 175.0950
1931 173.7050 176.1183 74.35750 174.5083
1932 173.7358 175.9408 74.50250 174.5658
1933 173.6767 175.8675 74.27583 174.5467
1934 173.3317 175.7667 74.02167 174.2742
1935 173.4108 175.8908 74.00417 174.3608
1936 173.4883 175.9392 74.16417 174.3933
1937 173.8200 175.9225 74.49583 174.6033
1938 173.8408 176.1408 74.52250 174.6892
1939 173.8458 176.2692 74.49250 174.7208
1940 173.7950 176.1417 74.42583 174.6450
1941 173.7000 176.1217 74.40750 174.5817
1942 173.8767 176.3342 74.51667 174.7675
1943 174.2317 176.6267 75.13417 175.1125
1944 174.0917 176.5967 74.84583 174.9817
1945 174.1883 176.5700 75.06000 175.0892
1946 174.1358 176.6017 74.99333 175.0383
1947 174.2350 176.5667 75.16917 175.1042
1948 174.2208 176.5308 75.08083 175.0975
1949 173.9733 176.2108 74.74083 174.8317
1950 174.1350 176.2608 74.89167 174.9300
1951 174.3242 176.7358 75.24667 175.2408
1952 174.5200 177.0850 75.36583 175.4992
1953 174.3308 176.9333 75.01500 175.3233
1954 174.2883 176.8292 74.98167 175.2350
1955 174.3483 176.7225 75.14583 175.2517
1956 174.1200 176.4400 74.92167 174.9392
1957 174.0367 176.2633 74.63333 174.8800
1958 173.8083 176.0675 74.42833 174.6292
1959 173.8500 176.0058 74.47500 174.6567
1960 174.0942 176.4775 74.71250 174.9900
1961 174.1008 176.3767 74.61333 174.9633
1962 173.9075 176.2225 74.64167 174.7700
1963 173.7267 175.9225 74.65500 174.5600
1964 173.6075 175.6825 74.28667 174.3650
1965 173.7125 175.9158 74.39583 174.5333
1966 173.8992 176.1608 74.69000 174.7567
1967 174.0592 176.3008 74.85833 174.9250
1968 174.2042 176.4467 74.79917 175.0608
1969 174.3900 176.6958 74.74917 175.2617
1970 174.2583 176.6783 74.68583 175.1500
1971 174.3117 176.8050 74.72333 175.2692
1972 174.5000 176.8883 74.93500 175.4208
1973 174.7408 177.1233 75.21000 175.6500
1974 174.6892 177.0933 75.06583 175.6133
1975 174.6158 176.9733 74.81000 175.5158
1976 174.5733 176.8992 75.04417 175.4783
1977 174.2975 176.5050 74.75833 175.1992
1978 174.3725 176.5908 74.92083 175.2708
1979 174.3917 176.7942 74.84000 175.3558
1980 174.5217 176.8033 74.82333 175.4075
1981 174.3983 176.6983 74.78583 175.2958
1982 174.4250 176.5983 74.75250 175.2992
1983 174.5208 176.8333 74.82167 175.4333
1984 174.5225 176.8950 74.90167 175.4350
1985 174.7283 177.1267 74.87750 175.7000
1986 174.8983 177.2925 75.14750 175.8517
1987 174.6675 176.9700 74.84917 175.5608
1988 174.2617 176.5642 74.68000 175.1933
1989 174.2258 176.4008 74.76083 175.1092
1990 174.2942 176.3500 74.80417 175.1308
1991 174.3267 176.4692 74.87000 175.1975
1992 174.3567 176.4792 74.83250 175.2200
1993 174.5042 176.6958 75.04250 175.3808
1994 174.3667 176.6783 74.75500 175.2900
1995 174.2842 176.5275 74.73917 175.1833
1996 174.3750 176.6542 74.87333 175.2467
1997 174.7175 176.9842 74.96250 175.6125
1998 174.5508 176.7167 74.91000 175.3975
1999 174.1025 176.2358 74.57250 174.9758
2000 173.9900 175.9783 74.78000 174.7925
2001 173.9142 175.9508 74.69917 174.7617
2002 174.0583 176.1183 74.79000 174.8833
2003 173.9658 175.8917 74.71333 174.7375
2004 174.1158 176.1108 74.84833 174.9167
2005 174.1700 176.0900 74.82167 174.9383
2006 174.1467 176.0158 74.83083 174.8917
2007 174.1392 175.9433 74.75167 174.8483
2008 174.1592 176.0050 74.87500 174.9200
2009 174.2483 176.2583 74.87333 175.0767
The mean cost is: 10.
The SSE is: 36.
The MSE is: 3.
Is the mean a good estimate of the cost?
pc_ebt <- read_table(
"Year Earnings_M Mean Error Squared_Error
1985 46.163 48.676 -2.513 6.313
1986 46.998 48.676 -1.678 2.814
1987 47.816 48.676 -0.860 0.739
1988 48.311 48.676 -0.365 0.133
1989 48.758 48.676 0.082 0.007
1990 49.164 48.676 0.488 0.239
1991 49.548 48.676 0.872 0.761
1992 48.915 48.676 0.239 0.057
1993 50.315 48.676 1.639 2.688
1994 50.768 48.676 2.092 4.378", col_names = TRUE
)
pc_ebtThe mean Earnings ($M) is: 48.68.
The SSE is: 18.129.
The MSE is: 1.81.
Is the mean a good estimate of the Earnings?
quarterly_sales <- read_table(
"Year Quarter Period Sales
90 1 1 362
90 2 2 385
90 3 3 432
90 4 4 341
91 1 5 382
91 2 6 409
91 3 7 498
91 4 8 387
92 1 9 473
92 2 10 513
92 3 11 582
92 4 12 474
93 1 13 544
93 2 14 582
93 3 15 681
93 4 16 557
94 1 17 628
94 2 18 707
94 3 19 773
94 4 20 592
95 1 21 627
95 2 22 725
95 3 23 854
95 4 24 661", col_names = TRUE
)
quarterly_salesquarterly_sales_moving_avg <- quarterly_sales |>
mutate(
moving_avg_3 = slider::slide_dbl(Sales, mean, .before = 1, .after = 1, .complete = TRUE),
moving_avg_5 = slider::slide_dbl(Sales, mean, .before = 2, .after = 2, .complete = TRUE),
moving_avg_7 = slider::slide_dbl(Sales, mean, .before = 3, .after = 3, .complete = TRUE)
)
quarterly_sales_moving_avgquarterly_sales_moving_avg |>
ggplot(aes(x = Period)) +
geom_line(aes(y = Sales, color = "Original")) +
geom_line(aes(y = moving_avg_3, color = "3-Period MA")) +
geom_line(aes(y = moving_avg_5, color = "5-Period MA")) +
geom_line(aes(y = moving_avg_7, color = "7-Period MA")) +
geom_smooth(aes(y = Sales, color = "Trend"),
method = "lm", formula = y ~ x,
se = FALSE, linetype = "dashed") +
scale_color_manual(values = c(
"Original" = "black",
"3-Period MA" = "red",
"5-Period MA" = "blue",
"7-Period MA" = "green",
"Trend" = "purple"
)) +
labs(color = "Series") + # Legend title
theme_light()\[ S_t = \alpha \sum_{i=1}^{t-2} (1-\alpha)^{i-1} y_{t-i} + (1-\alpha)^{t-2} y_1 \, , \,\,\,\,\, t \ge 2 \, . \]
\[
\alpha \sum_{i=0}^{t-1} (1-\alpha)^i = \alpha \left[ \frac{1-(1-\alpha)^t}{1-(1-\alpha)} \right] = 1 - (1-\alpha)^t
\]
\[
S_t = \alpha y_{t-1} + (1-\alpha)S_{t-1} \,\,\,\,\,\,\, 0 < \alpha \le 1 \,\,\,\,\,\,\, t \ge 3
\]
We can see that the summation term shows that the contribution to the smoothed value becomes less at each consecutive time period.
—6.4.3.1 Single Exponential Smoothing
# Plot of alpha (1 - alpha)^t for different values of alpha at t = 1:5
alpha_values <- c(0.1, 0.3, 0.5, 0.7, 0.9)
t_values <- 1:5
alpha_table <- tibble()
for (alpha in alpha_values) {
alpha_table <- alpha_table |>
bind_rows(
tibble(
alpha = alpha,
t = t_values,
value = (1 - alpha)^t
)
)
}
alpha_table |>
ggplot(aes(x = t, y = value, color = factor(alpha))) +
geom_point() +
geom_line() +
labs(
x = "t",
y = expression((1 - alpha)^t),
color = "alpha"
) +
scale_y_continuous(breaks = seq(0, 1, 0.1)) +
theme_light()\[ S_t = \alpha x_t + (1 - \alpha) S_{t-1} \quad \text{for} \quad 0 \le \alpha \le 1 \]
where \(S_t\) is the smoothed value at time \(t\), \(x_t\) is the observed value at time \(t\), and \(\alpha\) is the smoothing parameter.
\[ S_t = \alpha x_t + (1 - \alpha) (S_{t-1} + b_{t-1}) \]
\[ b_t = \beta (S_t - S_{t-1}) + (1 - \beta) b_{t-1} \]
where \(b_t\) is the trend component at time \(t\), and \(\beta\) is the trend smoothing parameter.
\[ \begin{align} S_t &= \alpha \frac{x_t}{c_t - L} + (1 - \alpha) (S_{t-1} + b_{t-1}) \\ b_t &= \beta (S_t - S_{t-1}) + (1 - \beta) b_{t-1} \\ c_t &= \gamma \frac{x_t}{S_t} + (1 - \gamma) c_{t-L} \end{align} \]
Where \(L\) is the length of the seasonal cycle, \(c_t\) is the seasonal component at time \(t\), and \(\gamma\) is the seasonal smoothing parameter.
Fit below was using a single exponential smoothing model with alpha = 0.3
sample_data <- read_table(
"Data Fit
6.4 NA
5.6 6.4
7.8 6.2
8.8 6.7
11.0 7.3
11.6 8.4
16.7 9.4
15.3 11.6
21.6 12.7
22.4 15.4", col_names = TRUE
)
sample_data |>
ggplot() +
geom_line(aes(x = seq_along(Data), y = Fit), color = "blue", linetype = "dashed") +
geom_line(aes(x = seq_along(Data), y = Data), color = "black") +
theme_light()des_rjh <- function(x, alpha = 0.3, beta = 0.3) {
n <- length(x)
S_t <- numeric(n) # Initialize vector with zeros
b_t <- numeric(n) # Initialize vector with zeros
# Set first value
S_t[1] <- x[1]
b_t[1] <- x[2] - x[1]
# Calculate smoothed values
for(t in 2:n) {
S_t[t] <- alpha * x[t] + (1 - alpha) * (S_t[t-1] + b_t[t-1])
b_t[t] <- beta * (S_t[t] - S_t[t-1]) + (1 - beta) * b_t[t-1]
}
return(S_t + b_t)
}tes_rjh <- function(x, alpha = 0.3, b = 0.3, gamma = 0.3, m = 1) {
n <- length(x)
S_t <- numeric(n) # Initialize vector with zeros
b_t <- numeric(n) # Initialize vector with zeros
c_t <- numeric(n) # Initialize vector with zeros
# Set first value
S_t[1] <- x[1]
b_t[1] <- x[2] - x[1]
c_t[1] <- x[3] - x[2]
# Calculate smoothed values
for(t in 2:n) {
S_t[t] <- alpha * x[t] + (1 - alpha) * (S_t[t-1] + b_t[t-1])
b_t[t] <- b * (S_t[t] - S_t[t-1]) + (1 - b) * b_t[t-1]
c_t[t] <- gamma * (S_t[t] - S_t[t-1]) + (1 - gamma) * c_t[t-1]
}
return(S_t + b_t + c_t)
}quarterly_sales <- read_table(
"Year Quarter Period Sales
90 1 1 362
90 2 2 385
90 3 3 432
90 4 4 341
91 1 5 382
91 2 6 409
91 3 7 498
91 4 8 387
92 1 9 473
92 2 10 513
92 3 11 582
92 4 12 474
93 1 13 544
93 2 14 582
93 3 15 681
93 4 16 557
94 1 17 628
94 2 18 707
94 3 19 773
94 4 20 592
95 1 21 627
95 2 22 725
95 3 23 854
95 4 24 661", col_names = TRUE
)
quarterly_sales Series 1 Series 2 Series 3
[1,] -1.304420 0.04892015 0.48851542
[2,] 1.071498 1.15524546 0.25199163
[3,] 1.326470 1.18862356 -0.05942362
[4,] 1.393335 -0.26110484 0.94474235
[5,] -1.077372 0.55385374 0.80773964
[6,] -1.824166 0.26649029 -0.42073847
quarterly_sales_tsibble <- quarterly_sales |>
mutate(
# Convert 2-digit year to 4-digit year
Year = 1900 + Year,
# Create proper quarter dates by converting to months (Q1=1, Q2=4, Q3=7, Q4=10)
Month = (Quarter - 1) * 3 + 1,
# Create the yearquarter date
YearQuarter = yearquarter(paste(Year, Month, "01", sep = "-"))
) |>
# Select needed columns and convert to tsibble
select(YearQuarter, Sales) |>
as_tsibble(
index = YearQuarter
)
quarterly_sales_tsibbleGlobal Silicon Wafer Shipments
Units are in “million square inches (MSI)”
| year | quarter | msi |
|---|---|---|
| 2001 | Q1 | 1250 |
| 2001 | Q2 | 988 |
| … | … | … |
msi_ts <- msi_historical |>
pivot_longer(cols = -year, names_to = "quarter", values_to = "quarter_value", ) |>
separate_wider_delim(cols = quarter_value, names = c("quarter_2", "msi"), delim = " ") |>
mutate(msi = as.numeric(str_remove(msi, ","))) |>
select(-quarter_2) |>
mutate(year_quarter = str_c(year, " ", quarter)) |>
select(year_quarter, msi) |>
mutate(year_quarter = yearquarter(year_quarter)) |>
filter(!is.na(msi)) |>
as_tsibble(index = year_quarter)
msi_tsmy_theme <- theme(
plot.title = ggtext::element_markdown(size = 18, face = "bold"),
plot.title.position = "plot",
plot.subtitle = ggtext::element_markdown(size = 14),
axis.text.x = ggtext::element_markdown(size = 14),
axis.text.y = ggtext::element_markdown(size = 14),
plot.caption = ggtext::element_markdown(hjust = 0, margin = margin(t = 20)),
plot.caption.position = "plot"
)
msi_ts |>
autoplot(msi) +
labs(
x = NULL,
y = NULL,
title = "Global Silicon Wafer Shipments",
subtitle = "MSI (million square inches)",
caption = "Source: SEMI, October 2024"
) +
scale_y_continuous(labels = scales::comma) +
scale_x_yearquarter(date_labels = "%Y", date_minor_breaks = "1 year") +
theme_light() +
my_themefit_ets_msi <- msi_ts |>
model(model_ets = ETS(msi ~ error("A") + trend("A") + season("A")))
fit_ets_msi |>
forecast(h = 12) |>
autoplot(msi_ts) +
geom_line(aes(y = .fitted), data = augment(fit_ets_msi), color = "blue") +
labs(
x = NULL,
y = NULL,
title = "Global Silicon Wafer Shipments",
subtitle = "MSI (million square inches)",
caption = "Source: SEMI, October 2024"
) +
scale_y_continuous(labels = scales::comma) +
scale_x_yearquarter(date_labels = "%Y", date_minor_breaks = "1 year") +
theme_light() +
my_themefit_arima_msi <- msi_ts |>
model(model_arima = ARIMA(msi, stepwise = FALSE, approximation = FALSE))
fit_arima_msi |>
forecast(h = 12) |>
autoplot(msi_ts) +
geom_line(aes(y = .fitted), data = augment(fit_arima_msi), color = "blue") +
labs(
x = NULL,
y = NULL,
title = "Global Silicon Wafer Shipments",
subtitle = "MSI (million square inches)",
caption = "Source: SEMI, October 2024"
) +
scale_y_continuous(labels = scales::comma) +
scale_x_yearquarter(date_labels = "%Y", date_minor_breaks = "1 year") +
theme_light() +
my_thememsi_long_year <- msi_historical |>
pivot_longer(cols = -year, names_to = "quarter", values_to = "quarter_value", ) |>
separate_wider_delim(cols = quarter_value, names = c("quarter_2", "msi"), delim = " ") |>
mutate(msi = as.numeric(str_remove(msi, ","))) |>
select(-quarter_2) |>
group_by(year) |>
summarise(
msi = sum(msi, na.rm = TRUE)
) |>
ungroup()
msi_long_yearmsi_long_year_join |>
filter(year >= 2022) |>
ggplot(aes(x = year)) +
geom_col(aes(y = msi, fill = type)) +
labs(
x = NULL,
y = NULL,
title = "Global Silicon Wafer Shipments",
subtitle = "MSI (million square inches)",
caption = "Source: SEMI, October 2024",
) +
scale_y_continuous(labels = scales::comma) +
scale_fill_manual(values = c("grey", "lightblue"), name = NULL, labels = c("ARIMA", "SEMI")) +
guides(fill = guide_legend(reverse = TRUE)) +
theme_light() +
my_themeggplot() +
geom_line(aes(x = year, y = msi), data = arima_forecast, color = "grey", linewidth = 2) +
geom_line(aes(x = year, y = semi_foreast_msi), data = semi_forecast, color = "lightblue", linewidth = 2) +
geom_point(aes(x = year, y = msi),
data = arima_forecast, color = "grey", shape = 22, size = 4, stroke = 2, fill = "white") +
geom_point(aes(x = year, y = semi_foreast_msi),
data = semi_forecast, color = "lightblue", shape = 21, size = 4, stroke = 2, fill = "white") +
annotate("text", x = 2025, y = 14600, label = "ARIMA Forecast", hjust = 0, vjust = 0,
color = "grey", fontface = "bold") +
annotate("text", x = 2025.1, y = 13100, label = "SEMI Forecast", hjust = 0, vjust = 0,
color = "lightblue", fontface = "bold") +
scale_y_continuous(labels = scales::comma) +
labs(
x = NULL,
y = NULL,
title = "Global Silicon Wafer Shipments",
subtitle = "MSI (million square inches)",
caption = "Source: SEMI, October 2024",
) +
theme_light() +
my_theme Jan Feb Mar Apr May Jun Jul Aug Sep Oct
1959 315.42 316.31 316.50 317.56 318.13 318.00 316.39 314.65 313.68 313.18
1960 316.27 316.81 317.42 318.87 319.87 319.43 318.01 315.74 314.00 313.68
1961 316.73 317.54 318.38 319.31 320.42 319.61 318.42 316.63 314.83 315.16
1962 317.78 318.40 319.53 320.42 320.85 320.45 319.45 317.25 316.11 315.27
1963 318.58 318.92 319.70 321.22 322.08 321.31 319.58 317.61 316.05 315.83
1964 319.41 320.07 320.74 321.40 322.06 321.73 320.27 318.54 316.54 316.71
1965 319.27 320.28 320.73 321.97 322.00 321.71 321.05 318.71 317.66 317.14
1966 320.46 321.43 322.23 323.54 323.91 323.59 322.24 320.20 318.48 317.94
1967 322.17 322.34 322.88 324.25 324.83 323.93 322.38 320.76 319.10 319.24
1968 322.40 322.99 323.73 324.86 325.40 325.20 323.98 321.95 320.18 320.09
1969 323.83 324.26 325.47 326.50 327.21 326.54 325.72 323.50 322.22 321.62
1970 324.89 325.82 326.77 327.97 327.91 327.50 326.18 324.53 322.93 322.90
1971 326.01 326.51 327.01 327.62 328.76 328.40 327.20 325.27 323.20 323.40
1972 326.60 327.47 327.58 329.56 329.90 328.92 327.88 326.16 324.68 325.04
1973 328.37 329.40 330.14 331.33 332.31 331.90 330.70 329.15 327.35 327.02
1974 329.18 330.55 331.32 332.48 332.92 332.08 331.01 329.23 327.27 327.21
1975 330.23 331.25 331.87 333.14 333.80 333.43 331.73 329.90 328.40 328.17
1976 331.58 332.39 333.33 334.41 334.71 334.17 332.89 330.77 329.14 328.78
1977 332.75 333.24 334.53 335.90 336.57 336.10 334.76 332.59 331.42 330.98
1978 334.80 335.22 336.47 337.59 337.84 337.72 336.37 334.51 332.60 332.38
1979 336.05 336.59 337.79 338.71 339.30 339.12 337.56 335.92 333.75 333.70
1980 337.84 338.19 339.91 340.60 341.29 341.00 339.39 337.43 335.72 335.84
1981 339.06 340.30 341.21 342.33 342.74 342.08 340.32 338.26 336.52 336.68
1982 340.57 341.44 342.53 343.39 343.96 343.18 341.88 339.65 337.81 337.69
1983 341.20 342.35 342.93 344.77 345.58 345.14 343.81 342.21 339.69 339.82
1984 343.52 344.33 345.11 346.88 347.25 346.62 345.22 343.11 340.90 341.18
1985 344.79 345.82 347.25 348.17 348.74 348.07 346.38 344.51 342.92 342.62
1986 346.11 346.78 347.68 349.37 350.03 349.37 347.76 345.73 344.68 343.99
1987 347.84 348.29 349.23 350.80 351.66 351.07 349.33 347.92 346.27 346.18
1988 350.25 351.54 352.05 353.41 354.04 353.62 352.22 350.27 348.55 348.72
1989 352.60 352.92 353.53 355.26 355.52 354.97 353.75 351.52 349.64 349.83
1990 353.50 354.55 355.23 356.04 357.00 356.07 354.67 352.76 350.82 351.04
1991 354.59 355.63 357.03 358.48 359.22 358.12 356.06 353.92 352.05 352.11
1992 355.88 356.63 357.72 359.07 359.58 359.17 356.94 354.92 352.94 353.23
1993 356.63 357.10 358.32 359.41 360.23 359.55 357.53 355.48 353.67 353.95
1994 358.34 358.89 359.95 361.25 361.67 360.94 359.55 357.49 355.84 356.00
1995 359.98 361.03 361.66 363.48 363.82 363.30 361.94 359.50 358.11 357.80
1996 362.09 363.29 364.06 364.76 365.45 365.01 363.70 361.54 359.51 359.65
1997 363.23 364.06 364.61 366.40 366.84 365.68 364.52 362.57 360.24 360.83
Nov Dec
1959 314.66 315.43
1960 314.84 316.03
1961 315.94 316.85
1962 316.53 317.53
1963 316.91 318.20
1964 317.53 318.55
1965 318.70 319.25
1966 319.63 320.87
1967 320.56 321.80
1968 321.16 322.74
1969 322.69 323.95
1970 323.85 324.96
1971 324.63 325.85
1972 326.34 327.39
1973 327.99 328.48
1974 328.29 329.41
1975 329.32 330.59
1976 330.14 331.52
1977 332.24 333.68
1978 333.75 334.78
1979 335.12 336.56
1980 336.93 338.04
1981 338.19 339.44
1982 339.09 340.32
1983 340.98 342.82
1984 342.80 344.04
1985 344.06 345.38
1986 345.48 346.72
1987 347.64 348.78
1988 349.91 351.18
1989 351.14 352.37
1990 352.69 354.07
1991 353.64 354.89
1992 354.09 355.33
1993 355.30 356.78
1994 357.59 359.05
1995 359.61 360.74
1996 360.80 362.38
1997 362.49 364.34
co2 data (1/ )tsibbleco2 data (2/ )co2 data (3/ )co2 data (4/ )ts objectdco2 data (5/ )fit_ets_co2_current <- co2_current_data_ts |>
model(model_ets = ETS(value ~ error("A") + trend("A") + season("A")))
fit_ets_co2_current |>
forecast(h = 120) |>
autoplot(filter(co2_current_data_ts, index >= yearmonth("2000-01-1"))) +
scale_x_yearmonth(date_labels = "%Y") +
labs(
x = "Year",
y = "CO<sub>2</sub> Concentration (ppm)"
) +
theme_light() +
theme(axis.title.y = ggtext::element_markdown())Applied Statistical Techniques